Results 1 - 19 of 19
1.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13599-13620, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37459267

ABSTRACT

Recent works have revealed an essential paradigm in designing loss functions that differentiates individual losses from aggregate losses. The individual loss measures the quality of the model on a sample, while the aggregate loss combines individual losses/scores over each training sample. Both have a common procedure that aggregates a set of individual values to a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, becomes a significant property of organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregator to form such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses.

2.
IEEE Trans Cybern ; 53(11): 7162-7173, 2023 Nov.
Article in English | MEDLINE | ID: mdl-36264736

ABSTRACT

So far, researchers have proposed many forensics tools to protect the authenticity and integrity of digital information. However, with the explosive development of machine learning, existing forensics tools may be compromised by new attacks at any time. Hence, it is always necessary to investigate anti-forensics to expose the vulnerabilities of forensics tools, which in turn helps forensics researchers develop new tools as countermeasures. To date, one of the potential threats is generative adversarial networks (GANs), which could be employed to fabricate or forge data to attack forensics detectors. In this article, we investigate the anti-forensics performance of GANs by proposing a novel model, the ExS-GAN, which features an extra supervision system. After training, the proposed model could launch anti-forensics attacks on various manipulated images. Evaluated by experiments, the proposed method could achieve high anti-forensics performance while preserving satisfactory image quality. We also justify the proposed extra supervision via an ablation study.

3.
IEEE Trans Neural Netw Learn Syst ; 34(5): 2633-2646, 2023 May.
Article in English | MEDLINE | ID: mdl-34520365

ABSTRACT

Scene parsing, or semantic segmentation, aims at labeling all pixels in an image with the predefined categories of things and stuff. Learning a robust representation for each pixel is crucial for this task. Existing state-of-the-art (SOTA) algorithms employ deep neural networks to learn (discover) the representations needed for parsing from raw data. Nevertheless, these networks discover desired features or representations only from the given image (content), ignoring more generic knowledge contained in the dataset. To overcome this deficiency, we make the first attempt to explore the meaningful supportive knowledge, including general visual concepts (i.e., the generic representations for objects and stuff) and their relations from the whole dataset to enhance the underlying representations of a specific scene for better scene parsing. Specifically, we propose a novel supportive knowledge mining module (SKMM) and a knowledge augmentation operator (KAO), which can be easily plugged into modern scene parsing networks. By taking image-specific content and dataset-level supportive knowledge into full consideration, the resulting model, called knowledge augmented neural network (KANN), can better understand the given scene and provide greater representational power. Experiments are conducted on three challenging scene parsing and semantic segmentation datasets: Cityscapes, Pascal-Context, and ADE20K. The results show that our KANN is effective and achieves better results than all existing SOTA methods.

4.
IEEE Trans Image Process ; 31: 2782-2795, 2022.
Article in English | MEDLINE | ID: mdl-35344493

ABSTRACT

Human detection and pose estimation are essential for understanding human activities in images and videos. Mainstream multi-human pose estimation methods take a top-down approach, where human detection is first performed, then each detected person bounding box is fed into a pose estimation network. This top-down approach suffers from the early commitment of initial detections in crowded scenes and other cases with ambiguities or occlusions, leading to pose estimation failures. We propose the DetPoseNet, an end-to-end multi-human detection and pose estimation framework in a unified three-stage network. Our method consists of a coarse-pose proposal extraction sub-net, a coarse-pose based proposal filtering module, and a multi-scale pose refinement sub-net. The coarse-pose proposal sub-net extracts whole-body bounding boxes and body keypoint proposals in a single shot. The coarse-pose filtering step based on the person and keypoint proposals can effectively rule out unlikely detections, thus improving subsequent processing. The pose refinement sub-net performs cascaded pose estimation on each refined proposal region. Multi-scale supervision and multi-scale regression are used in the pose refinement sub-net to simultaneously strengthen context feature learning. Structure-aware loss and keypoint masking are applied to further improve the pose refinement robustness. Our framework is flexible enough to accept most existing top-down pose estimators in the role of the pose refinement sub-net. Experiments on COCO and OCHuman datasets demonstrate the effectiveness of the proposed framework. The proposed method is computationally efficient (5-6x speedup) in estimating multi-person poses with refined bounding boxes in sub-second time.

5.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 76-86, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750797

ABSTRACT

In this work, we introduce the average top-k (ATk) loss, which is the average over the k largest individual losses over a training dataset, as a new aggregate loss for supervised learning. We show that the ATk loss is a natural generalization of the two widely used aggregate losses, namely the average loss and the maximum loss. Yet, the ATk loss can better adapt to different data distributions because of the extra flexibility provided by the different choices of k. Furthermore, it remains a convex function over all individual losses and can be combined with different types of individual loss without significant increase in computation. We then provide interpretations of the ATk loss from the perspective of the modification of individual loss and robustness to training data distributions. We further study the classification calibration of the ATk loss and the error bounds of the ATk-SVM model. We demonstrate the applicability of minimum average top-k learning for supervised learning problems including binary/multi-class classification and regression, using experiments on both synthetic and real datasets.
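
The definition in this abstract (the average over the k largest individual losses) can be illustrated with a minimal Python sketch. The function name `average_top_k_loss` and the example values are assumptions made for illustration; the abstract does not describe any implementation or optimization procedure.

```python
def average_top_k_loss(losses, k):
    """Average of the k largest individual losses (illustrative helper name)."""
    if not 1 <= k <= len(losses):
        raise ValueError("k must satisfy 1 <= k <= len(losses)")
    # Sort in descending order and keep the k largest values.
    top_k = sorted(losses, reverse=True)[:k]
    return sum(top_k) / k


# Hypothetical individual losses for four training samples.
example_losses = [0.1, 2.0, 0.5, 1.4]
max_loss = average_top_k_loss(example_losses, 1)                     # k = 1: maximum loss
avg_loss = average_top_k_loss(example_losses, len(example_losses))   # k = n: average loss
top2_loss = average_top_k_loss(example_losses, 2)                    # in between
```

Setting k = 1 recovers the maximum loss and k = n recovers the average loss, which matches the generalization stated in the abstract.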


Subjects
Algorithms; Supervised Machine Learning
6.
IEEE Trans Cybern ; 51(1): 2-15, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31880574

ABSTRACT

We propose a fast online video pose estimation method to detect and track human upper-body poses based on a conditional dynamic Bayesian modeling of pose modes without referring to future frames. The estimation of human body poses from videos is an important task with many applications. Our method extends fast image-based pose estimation to live video streams by leveraging the temporal correlation of articulated poses between frames. Video pose estimation is inferred over a time window using a conditional dynamic Bayesian network (CDBN), which we term time-windowed CDBN. Specifically, latent pose modes and their transitions are modeled and co-determined from the combination of three modules: 1) inference based on current observations; 2) the modeling of mode-to-mode transitions as a probabilistic prior; and 3) the modeling of state-to-mode transitions using a multimode softmax regression. Given the predicted pose modes, the body poses in terms of arm joint locations can then be determined more accurately and robustly. Our method is well suited to high frame rate (HFR) scenarios, where pose mode transitions can effectively capture action-related temporal information to boost performance. We evaluate our method on a newly collected HFR-Pose dataset and four major video pose datasets (VideoPose2, TUM Kitchen, FLIC, and Penn_Action). Our method achieves improvements in both accuracy and efficiency over existing online video pose estimation methods.

7.
Med Image Anal ; 68: 101878, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33197714

ABSTRACT

Multimodal image registration is a vital initial step in several medical image applications for providing complementary information from different data modalities. Since images with different modalities do not exhibit the same characteristics, finding their accurate correspondences remains a challenge. For convolutional multimodal registration methods, two components are quite significant: a descriptive image feature and a suitable similarity metric. However, these two components are often custom-designed and cannot accommodate the high diversity of tissue appearance across modalities. In this paper, we translate image registration into a decision-making problem, where registration is achieved via an artificial agent trained by asynchronous reinforcement learning. More specifically, convolutional long short-term memory is incorporated after stacked convolutional layers in this method to extract spatial-temporal image features and learn the similarity metric implicitly. A customized reward function driven by landmark error is advocated to guide the agent to the correct registration direction. A Monte Carlo rollout strategy is also leveraged to perform look-ahead inference in the testing stage, to increase registration accuracy further. Experiments on paired CT and MR images of patients diagnosed with nasopharyngeal carcinoma demonstrate that our method achieves state-of-the-art performance in medical image registration.


Subjects
Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Humans
8.
IEEE Trans Image Process ; 27(4): 1809-1821, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29346096

ABSTRACT

To effectively solve the challenges in object tracking, such as large deformation and severe occlusion, many existing methods use graph-based models to capture target part relations, and adopt a sequential scheme of target part selection, part matching, and state estimation. However, such methods have two major drawbacks: 1) inaccurate part selection leads to performance deterioration of part matching and state estimation and 2) there are insufficient effective global constraints for local part selection and matching. In this paper, we propose a new object tracking method based on iterative graph seeking, which integrates target part selection, part matching, and state estimation using a unified energy minimization framework. Our method also incorporates structural information in local parts variations using the global constraint. We devise an alternative iteration scheme to minimize the energy function for searching the most plausible target geometric graph. Experimental results on several challenging benchmarks (i.e., VOT2015, OTB2013, and OTB2015) demonstrate improved performance and robustness in comparison with existing algorithms.

9.
Nanotechnology ; 28(43): 435204, 2017 Oct 27.
Article in English | MEDLINE | ID: mdl-28786401

ABSTRACT

Wearable electronics are in high demand, requiring that all the components are flexible. Here we report a facile approach for the fabrication of flexible polypyrrole nanowire (NPPy)/carbon fiber (CF) hybrid electrodes with high electrochemical activity using a low-cost, one-step electrodeposition method. The structure of the NPPy/CF electrodes can be easily controlled by the applied electrical potential and electrodeposition time. Our NPPy/CF-based electrodes showed high flexibility, conductivity, and stability, making them ideal for flexible all-solid-state fiber supercapacitors. The resulting NPPy/CF-based supercapacitors provided a high specific capacitance of 148.4 F g⁻¹ at 0.128 A g⁻¹, which is much higher than for supercapacitors based on polypyrrole film/CF (38.3 F g⁻¹) and pure CF (0.6 F g⁻¹) under the same conditions. The NPPy/CF-based supercapacitors also showed high bending and cycling stability, retaining 84% of the initial capacitance after 500 bending cycles, and 91% of the initial capacitance after 5000 charge/discharge cycles.

10.
IEEE Trans Cybern ; 47(12): 4182-4195, 2017 Dec.
Article in English | MEDLINE | ID: mdl-27875238

ABSTRACT

Graph-based representation is widely used in the visual tracking field by finding correct correspondences between target parts in different frames. However, most graph-based trackers consider pairwise geometric relations between local parts. They do not make full use of the target's intrinsic structure, thereby making the representation easily disturbed by errors in pairwise affinities when large deformation or occlusion occurs. In this paper, we propose a geometric hypergraph learning-based tracking method, which fully exploits high-order geometric relations among multiple correspondences of parts in different frames. Then visual tracking is formulated as the mode-seeking problem on the hypergraph in which vertices represent correspondence hypotheses and hyperedges describe high-order geometric relations among correspondences. Besides, a confidence-aware sampling method is developed to select representative vertices and hyperedges to construct the geometric hypergraph for more robustness and scalability. The experiments are carried out on three challenging datasets (VOT2014, OTB100, and Deform-SOT) to demonstrate that our method performs favorably against other existing trackers.

11.
IEEE Trans Image Process ; 25(8): 3572-84, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27214901

ABSTRACT

Recent advances in online visual tracking focus on designing part-based models to handle the deformation and occlusion challenges. However, previous methods usually consider only the pairwise structural dependences of target parts in two consecutive frames rather than the higher order constraints in multiple frames, making them less effective in handling large deformation and occlusion challenges. This paper describes a new and efficient method for online deformable object tracking. Different from most existing methods, this paper exploits higher order structural dependences of different parts of the tracking target in multiple consecutive frames. We construct a structure-aware hyper-graph to capture such higher order dependences, and solve the tracking problem by searching dense subgraphs on it. Furthermore, we also describe a new evaluation data set for online deformable object tracking (the Deform-SOT data set), which includes 50 challenging sequences with full annotations that represent realistic tracking challenges, such as large deformations and severe occlusions. The experimental result of the proposed method shows considerable improvement in performance over the state-of-the-art tracking methods.

12.
IEEE Trans Vis Comput Graph ; 22(12): 2564-2578, 2016 Dec.
Article in English | MEDLINE | ID: mdl-26761821

ABSTRACT

Similar objects are ubiquitous and abundant in both natural and artificial scenes. Determining the visual importance of several similar objects in a complex photograph is a challenge for image understanding algorithms. This study aims to define the importance of similar objects in an image and to develop a method that can select the most important instances for an input image from multiple similar objects. This task is challenging because multiple objects must be compared without adequate semantic information. This challenge is addressed by building an image database and designing an interactive system to measure object importance from human observers. This ground truth is used to define a range of features related to the visual importance of similar objects. Then, these features are used in learning-to-rank and random forest to rank similar objects in an image. Importance predictions were validated on 5,922 objects. The most important objects can be identified automatically. The factors related to composition (e.g., size, location, and overlap) are particularly informative, although clarity and color contrast are also important. We demonstrate the usefulness of similar object importance on various applications, including image retargeting, image compression, image re-attentionizing, image admixture, and manipulation of blindness images.

13.
IEEE Trans Pattern Anal Mach Intell ; 38(10): 1983-96, 2016 Oct.
Article in English | MEDLINE | ID: mdl-26700969

ABSTRACT

Most multi-object tracking algorithms are developed within the tracking-by-detection framework, which considers the pairwise appearance similarities between detection responses or tracklets within a limited temporal window, and are thus less effective in handling long-term occlusions or distinguishing spatially close targets with similar appearance in crowded scenes. In this work, we propose an algorithm that formulates the multi-object tracking task as one to exploit hierarchical dense structures on an undirected hypergraph constructed based on tracklet affinity. The dense structures indicate a group of vertices that are inter-connected with a set of hyperedges with high affinity values. The appearance and motion similarities among multiple tracklets across the spatio-temporal domain are considered globally by exploiting high-order similarities rather than pairwise ones, thereby making it easier to distinguish spatially close targets with similar appearance. In addition, the hierarchical design of the optimization process helps the proposed tracking algorithm handle long-term occlusions robustly. Extensive experiments on various challenging datasets of both multi-pedestrian and multi-face tracking tasks demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.

14.
Neural Comput ; 23(11): 2942-73, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21851283

ABSTRACT

Efficient coding transforms that reduce or remove statistical dependencies in natural sensory signals are important for both biology and engineering. In recent years, divisive normalization (DN) has been advocated as a simple and effective nonlinear efficient coding transform. In this work, we first elaborate on the theoretical justification for DN as an efficient coding transform. Specifically, we use the multivariate t model to represent several important statistical properties of natural sensory signals and show that DN approximates the optimal transforms that eliminate statistical dependencies in the multivariate t model. Second, we show that several forms of DN used in the literature are equivalent in their effects as efficient coding transforms. Third, we provide a quantitative evaluation of the overall dependency reduction performance of DN for both the multivariate t models and natural sensory signals. Finally, we find that statistical dependencies in the multivariate t model and natural sensory signals are increased by the DN transform with low-input dimensions. This implies that for DN to be an effective efficient coding transform, it has to pool over a sufficiently large number of inputs.


Subjects
Brain/physiology; Models, Neurological; Neurons/physiology; Animals; Humans; Nonlinear Dynamics
15.
Neural Comput ; 21(6): 1485-519, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19191599

ABSTRACT

We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, known as independent component analysis (ICA), exists for the case when the signal is generated as a linear transformation of independent nongaussian sources. Here, we examine a complementary case, in which the source is nongaussian and elliptically symmetric. In this case, no invertible linear transform suffices to decompose the signal into independent components, but we show that a simple nonlinear transformation, which we call radial gaussianization (RG), is able to remove all dependencies. We then examine this methodology in the context of natural image statistics. We first show that distributions of spatially proximal bandpass filter responses are better described as elliptical than as linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either nearby pairs or blocks of bandpass filter responses is significantly greater than that achieved by ICA. Finally, we show that the RG transformation may be closely approximated by divisive normalization, which has been used to model the nonlinear response properties of visual neurons.
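
The idea of a radial nonlinearity can be sketched for the two-dimensional case (for example, a pair of filter responses). This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the input points are already zero-mean and whitened, estimates the radial distribution by empirical ranks, and maps each radius to the matching quantile of the Rayleigh distribution, which is the radial law of a standard 2-D Gaussian. The function name `radial_gaussianize_2d` is hypothetical.

```python
import math


def radial_gaussianize_2d(points):
    """Rescale whitened 2-D points so their radii follow a Rayleigh law.

    Assumption: `points` is a list of (x, y) pairs that are already
    zero-mean and whitened. Directions are kept; only radii change.
    """
    n = len(points)
    radii = [math.hypot(x, y) for x, y in points]
    # Indices of the points ordered by increasing radius.
    order = sorted(range(n), key=lambda i: radii[i])
    out = [None] * n
    for rank, i in enumerate(order):
        u = (rank + 0.5) / n  # empirical CDF value, strictly inside (0, 1)
        # Inverse of the Rayleigh CDF 1 - exp(-r^2 / 2).
        target = math.sqrt(-2.0 * math.log(1.0 - u))
        scale = target / radii[i] if radii[i] > 0 else 0.0
        x, y = points[i]
        out[i] = (x * scale, y * scale)
    return out


# Hypothetical input: three points with radii 5, 1, and 2.
transformed = radial_gaussianize_2d([(3.0, 4.0), (0.0, 1.0), (-2.0, 0.0)])
```

Because the map acts only on the radius and is monotonic, the ordering of the radii and the direction of each point are preserved. Higher-dimensional inputs would need the quantile function of the chi distribution with the matching number of degrees of freedom, which is not covered by this sketch.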


Subjects
Nonlinear Dynamics; Normal Distribution; Principal Component Analysis; Signal Processing, Computer-Assisted; Artificial Intelligence; Humans; Image Interpretation, Computer-Assisted
16.
IEEE Trans Pattern Anal Mach Intell ; 31(4): 693-706, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19229084

ABSTRACT

The local statistical properties of photographic images, when represented in a multi-scale basis, have been described using Gaussian scale mixtures. Here, we use this local description as a substrate for constructing a global field of Gaussian scale mixtures (FoGSMs). Specifically, we model multi-scale subbands as a product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF. We show that parameter estimation for this model is feasible, and that samples drawn from a FoGSM model have marginal and joint statistics similar to subband coefficients of photographic images. We develop an algorithm for removing additive Gaussian white noise based on the FoGSM model, and demonstrate denoising performance comparable with state-of-the-art methods.

17.
Adv Neural Inf Process Syst ; 2008: 1009-1016, 2008.
Article in English | MEDLINE | ID: mdl-25328365

ABSTRACT

We consider the problem of transforming a signal to a representation in which the components are statistically independent. When the signal is generated as a linear transformation of independent Gaussian or non-Gaussian sources, the solution may be computed using a linear transformation (PCA or ICA, respectively). Here, we consider a complementary case, in which the source is non-Gaussian but elliptically symmetric. Such a source cannot be decomposed into independent components using a linear transform, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We apply this methodology to natural signals, demonstrating that the joint distributions of nearby bandpass filter responses, for both sounds and images, are closer to being elliptically symmetric than linearly transformed factorial sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.

18.
Article in English | MEDLINE | ID: mdl-25346590

ABSTRACT

In this paper, we describe a nonlinear image representation based on divisive normalization that is designed to match the statistical properties of photographic images, as well as the perceptual sensitivity of biological visual systems. We decompose an image using a multi-scale oriented representation, and use Student's t as a model of the dependencies within local clusters of coefficients. We then show that normalization of each coefficient by the square root of a linear combination of the amplitudes of the coefficients in the cluster reduces statistical dependencies. We further show that the resulting divisive normalization transform is invertible and provide an efficient iterative inversion algorithm. Finally, we probe the statistical and perceptual advantages of this image representation by examining its robustness to added noise, and using it to enhance image contrast.
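
The normalization step described above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's transform: it treats one cluster of coefficients with a single shared denominator, and it assumes the linear combination is taken over squared amplitudes plus a positive constant. The names `divisive_normalize`, `weights`, and `bias` are hypothetical.

```python
import math


def divisive_normalize(coeffs, weights, bias):
    """Divide each coefficient by sqrt(bias + sum_j weights[j] * coeffs[j]**2).

    Assumptions: one cluster, one shared denominator, squared amplitudes
    in the pooled term. `weights` and `bias` are illustrative parameters.
    """
    if len(coeffs) != len(weights):
        raise ValueError("coeffs and weights must have the same length")
    pooled = bias + sum(w * c * c for w, c in zip(weights, coeffs))
    if pooled <= 0:
        raise ValueError("pooled energy must be positive")
    denom = math.sqrt(pooled)
    return [c / denom for c in coeffs]


# Hypothetical cluster of two coefficients with unit weights and no bias.
normalized = divisive_normalize([3.0, 4.0], [1.0, 1.0], 0.0)
```

With unit weights and zero bias the denominator is the Euclidean norm of the cluster, so the output has unit length. A positive bias keeps the denominator away from zero when all coefficients are small.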

19.
Proc Natl Acad Sci U S A ; 101(49): 17006-10, 2004 Dec 07.
Article in English | MEDLINE | ID: mdl-15563599

ABSTRACT

We describe a computational technique for authenticating works of art, specifically paintings and drawings, from high-resolution digital scans of the original works. This approach builds a statistical model of an artist from the scans of a set of authenticated works against which new works then are compared. The statistical model consists of first- and higher-order wavelet statistics. We show preliminary results from our analysis of 13 drawings that at various times have been attributed to Pieter Bruegel the Elder; these results confirm expert authentications. We also apply these techniques to the problem of determining the number of artists that may have contributed to a painting attributed to Pietro Perugino and again achieve an analysis agreeing with expert opinion.
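
As a rough illustration of what summary statistics of wavelet coefficients can look like, the sketch below computes the mean, variance, skewness, and kurtosis of one list of coefficients. Everything here is an assumption for illustration: the abstract does not specify which statistics were used, the coefficients are taken to come from a wavelet decomposition performed elsewhere, and the function name `coefficient_statistics` is hypothetical.

```python
import math


def coefficient_statistics(coeffs):
    """Return (mean, variance, skewness, kurtosis) of a list of coefficients.

    Uses population (divide-by-n) moments. The choice of these four
    moments is an illustrative assumption, not taken from the abstract.
    """
    n = len(coeffs)
    if n == 0:
        raise ValueError("coeffs must be non-empty")
    mean = sum(coeffs) / n
    variance = sum((c - mean) ** 2 for c in coeffs) / n
    if variance == 0:
        raise ValueError("skewness and kurtosis are undefined for constant input")
    std = math.sqrt(variance)
    skewness = sum(((c - mean) / std) ** 3 for c in coeffs) / n
    kurtosis = sum(((c - mean) / std) ** 4 for c in coeffs) / n
    return mean, variance, skewness, kurtosis


# Hypothetical coefficients from a single subband.
stats = coefficient_statistics([-1.0, 1.0, -1.0, 1.0])
```

A feature vector for one scanned work could then be assembled by concatenating such tuples across subbands, and feature vectors of different works compared by distance.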

...